Methods, systems, and computer program products for summarizing operational behavior of a computer program

ABSTRACT

Methods, systems, and computer program products for summarizing operational behavior of a computer program are disclosed. A method for summarizing the operational behavior of a computer program may include executing a computer program in a mode that allows control over execution of the computer program. Execution of the program is paused at predetermined locations corresponding to each instruction in the computer program. For each location, contents of a call stack containing function calls made by the program that have not yet returned are recorded. For each function call in the call stack, information regarding the conditions under which the function was called is recorded. Execution of the program is then resumed until the next pause location is encountered.

TECHNICAL FIELD

The present invention relates to the analysis of a computer program to illustrate the inner workings of the program. More particularly, the present invention relates to methods, systems, and computer program products for summarizing operational behavior of a computer program to produce output that is concise and easily understood.

BACKGROUND ART

Computer programs often consist of thousands of lines of code and hundreds of subroutines. As a program is developed, it becomes increasingly difficult to comprehend. As a result, software engineers use diagrams that help to explain the components that make up a program and how those components interact.

In object-oriented software design, there are two basic logical views of a system: a static view and a dynamic view. A static view answers the question “What?”, for example, “What are the classes that are used by the program?” In contrast, a dynamic view answers “How?”, for example, “How does the system work?” In object-oriented design, a common example of a static view of a system is a class diagram. A class diagram illustrates the relationships between a set of classes. A common example of a dynamic view is a sequence diagram. A sequence diagram illustrates how different objects (instances of classes) interact under a given scenario.

Software engineers use specialized design tools to aid in the documentation of the design of the systems they develop. There are many mature software design tools on the market. These tools vary in sophistication. Some tools are mere drawing tools that allow a user to graphically illustrate the design of a system using standard symbols that are part of the design methodology specification. Other design tools are more sophisticated because they allow the model and the source code to stay synchronized, a process known as round-trip engineering.

The current trend in software design tools is toward automating the often tedious work of drawing design diagrams and to achieve close synchronization with the software source code as changes are made to the system. The market is saturated with software design tools that focus on generating the static view of a system, while not providing any assistance in generating the dynamic view. This is largely because it is quite simple to reverse engineer the static design of a system by analyzing the source code. Generating a dynamic view of the system is much more difficult because it requires analyzing a running program. Some products analyze the source code in an attempt to describe the program flow of a system. The main problem with this approach is that it is impossible to predict the exact behavior of a system without actually running the program. This is because the path of execution depends largely on conditions that are not known until a program is executed.

Tools that are used to analyze the behavior of a system are referred to as profilers. The main purpose of a profiler is to analyze a running program in order to isolate bottlenecks and to improve overall performance. Some diagramming tools utilize profilers in order to build sequence diagrams or call trees. These tools provide some insight into the behavior of a system, but they usually produce output that would consume reams of paper if printed. Typical output includes a listing of the results of execution of each statement, similar to output produced manually by inserting I/O statements to print variable values after each program statement. The reason for such voluminous output is that profilers lack the ability to summarize the execution flow in a way that can be easily understood by a human.

Some profilers produce sequence diagrams. A problem with current profiling tools that produce sequence diagrams is that the generated documentation lacks critical detail about the execution flow. Two very important aspects of execution flow are execution looping and conditional execution. Current profiling products do not depict looping or conditional execution, yet most programs are composed mainly of execution loops and conditional execution statements. As a result, it is not possible to truly understand why a program is behaving a certain way using current profiling tool technology. A solution is required that can summarize program execution into a sequence diagram that illustrates the nature of the program flow. This can be accomplished by tracking looping and conditional execution and annotating the resulting sequence diagram. However, such tracking and annotating have only been performed manually by software engineers through analysis of thousands of lines of source code and program output.

In light of the problems with current program analysis tools, there exists a long-felt need for improved methods, systems, and computer program products for summarizing operational behavior of a computer program.

DISCLOSURE OF THE INVENTION

The present invention includes methods, systems, and computer program products for summarizing the execution flow of a computer program. According to one method, a computer program is executed in a mode that allows control over execution of the program. Execution of the program is paused at locations corresponding to instructions in the computer program. For each location, contents of a call stack containing function calls made by the program that have not yet returned are recorded. For each function in the call stack, conditions under which the function was called are recorded. The conditions may include the sequence of functions that resulted in the current function being called and whether the function is executing in a loop. In implementation, the contents of the call stack may be recorded in a data structure, referred to herein as a shadow stack. A new shadow stack instance may be created for each breakpoint location. A summarized call tree may be used to store relationships between calls for each instance of the shadow stack. Computer program output may be presented to the user in a summarized format, such as a sequence diagram. Post-processing of intermediate or machine code corresponding to the computer program may be performed to add notation to the sequence diagram that indicates guard conditions for loops and conditionally executed blocks of code.
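By way of illustration only, the following C++ sketch shows one way shadow-stack snapshots could be merged into a summarized call tree. It assumes that each snapshot is taken at a function-entry breakpoint and lists function names from outermost to innermost; the names CallNode, ShadowStack, and recordEntry are hypothetical and are not taken from the described implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node of a summarized call tree: every occurrence of the same
// call path (same chain of callers) is merged into a single node.
struct CallNode {
    std::string name;
    int callCount = 0;  // how many times this exact call path was entered
    std::map<std::string, std::unique_ptr<CallNode>> children;
};

// Shadow stack snapshot: function names, outermost (index 0) to innermost.
using ShadowStack = std::vector<std::string>;

// Merge one snapshot, taken when the innermost function was just entered,
// into the tree. Callers already existed, so only the innermost node's
// counter is incremented; missing nodes along the path are created.
void recordEntry(CallNode& root, const ShadowStack& stack) {
    CallNode* node = &root;
    for (const std::string& fn : stack) {
        std::unique_ptr<CallNode>& child = node->children[fn];
        if (!child) {
            child = std::make_unique<CallNode>();
            child->name = fn;
        }
        node = child.get();
    }
    if (node != &root) ++node->callCount;  // ignore empty snapshots
}
```

Because identical call paths share a node, repeated calls such as the ten calls to doX in the FIG. 2 scenario collapse into one node with a count of ten rather than ten separate entries.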

Although the methods and systems described herein may produce a sequence diagram for display to the user, the present invention is not limited to producing sequence diagrams. Alternative diagrams that may be produced by the methods and systems described herein to illustrate operational behavior of a computer program include behavioral views of the system, such as collaboration diagrams, state diagrams, and object interaction diagrams.

One aspect of the invention may include analyzing the execution flow of a computer program by monitoring function/method calls made by a program. The analysis may include detecting when the program is in an execution loop or when a program has conditionally executed a block of code. This information is used to summarize the behavior of a program by expressing function call sequences with loop and conditional execution notation. One problem that is solved by the present invention is how to identify where the execution loop actually exists. One exemplary implementation described herein determines the origin of an execution loop by combining the use of the shadow stack, a local loop counter, and the summarized call tree.
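The following C++ sketch illustrates, under simplifying assumptions, how a per-frame local loop counter kept on a shadow stack could flag a callee as being called from a loop: if a single activation of a caller enters the same callee more than once, that call is reported as looping in the caller. It assumes a breakpoint on every function of interest and compares frames by function name only; LoopDetector, ShadowFrame, onEntry, and finish are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One shadow-stack frame per active call, with a local loop counter per callee.
struct ShadowFrame {
    std::string function;
    std::map<std::string, int> calleeCounts;
};

class LoopDetector {
public:
    // caller -> callee -> most iterations observed within one activation of caller
    std::map<std::string, std::map<std::string, int>> loops;

    // Called at each function-entry breakpoint with the real call stack,
    // outermost first; realStack.back() is the function just entered.
    void onEntry(const std::vector<std::string>& realStack) {
        if (realStack.empty()) return;
        const std::size_t depth = realStack.size() - 1;  // frames below the new call
        // Pop shadow frames whose calls have returned since the last breakpoint.
        while (shadow_.size() > depth || !matchesPrefix(realStack)) popFrame();
        if (!shadow_.empty()) ++shadow_.back().calleeCounts[realStack.back()];
        shadow_.push_back({realStack.back(), {}});
    }

    // Flush the frames still active when analysis stops.
    void finish() {
        while (!shadow_.empty()) popFrame();
    }

private:
    std::vector<ShadowFrame> shadow_;

    // Only called when shadow_.size() <= depth, so every index is valid.
    bool matchesPrefix(const std::vector<std::string>& realStack) const {
        for (std::size_t i = 0; i < shadow_.size(); ++i)
            if (shadow_[i].function != realStack[i]) return false;
        return true;
    }

    // A callee entered more than once during a single activation of its
    // caller is treated as being called from a loop in that caller.
    void popFrame() {
        const ShadowFrame& frame = shadow_.back();
        for (const auto& entry : frame.calleeCounts) {
            if (entry.second > 1) {
                int& best = loops[frame.function][entry.first];
                if (entry.second > best) best = entry.second;
            }
        }
        shadow_.pop_back();
    }
};
```

On its own, this sketch cannot distinguish a loop from two consecutive calls to the same function in straight-line code; the approach described above also draws on the summarized call tree, whose exact role is not reproduced here.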

In one exemplary implementation, the present invention includes a sequence analysis engine (SAE). The SAE utilizes debugger services in order to examine and record the execution flow of a computer program. By using debugger services, it is meant that the SAE uses services provided by a debugger application programming interface (API), such as the WINDOWS® debugger API. While most debuggers are used by computer programmers to isolate and fix bugs in computer programs, one exemplary implementation of the present invention includes an approach that automates the use of debugger services to control, examine, and record the execution of a computer program. The SAE utilizes common debugger services to inspect the state of an executing program, to set and clear debug breakpoints, to single step into/over computer instructions, to control the execution of threads and processes, and to access debug symbols associated with the target program.
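The SAE control loop described above can be sketched in C++ against a hypothetical DebuggerServices interface. The interface, its method names, and the ScriptedDebugger test double are assumptions made for illustration; they do not reflect the signatures of the WINDOWS® debugger API or of any other real debugger API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using CallStack = std::vector<std::string>;  // function names, outermost first

// Hypothetical abstraction over the debugger services the SAE relies on.
// NOT the WINDOWS debugger API; method names are illustrative only.
class DebuggerServices {
public:
    virtual ~DebuggerServices() = default;
    virtual void setBreakpoint(const std::string& function) = 0;
    // Resumes the target; returns true when it stops at a breakpoint and
    // false once the target has exited.
    virtual bool continueToNextBreak() = 0;
    virtual CallStack callStack() = 0;
};

// SAE core loop: break only on functions of interest, record the call stack
// at every stop, and resume until the target exits.
std::vector<CallStack> runAnalysis(DebuggerServices& dbg,
                                   const std::set<std::string>& functionsOfInterest) {
    for (const std::string& fn : functionsOfInterest) dbg.setBreakpoint(fn);
    std::vector<CallStack> snapshots;
    while (dbg.continueToNextBreak()) snapshots.push_back(dbg.callStack());
    return snapshots;
}

// Scripted stand-in for a real debugger: replays function entries (each given
// as the call stack at entry) and stops only where a breakpoint is set.
class ScriptedDebugger : public DebuggerServices {
public:
    explicit ScriptedDebugger(std::vector<CallStack> entries)
        : entries_(std::move(entries)) {}
    void setBreakpoint(const std::string& function) override {
        breakpoints_.insert(function);
    }
    bool continueToNextBreak() override {
        while (next_ < entries_.size()) {
            current_ = entries_[next_++];
            if (!current_.empty() && breakpoints_.count(current_.back()) > 0) return true;
        }
        return false;
    }
    CallStack callStack() override { return current_; }

private:
    std::vector<CallStack> entries_;
    std::set<std::string> breakpoints_;
    CallStack current_;
    std::size_t next_ = 0;
};
```

Because breakpoints are set only on the functions of interest, entries into other functions never stop the target or reach the SAE, which is the efficiency argument made for debugger services over profiler services.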

An alternative approach to using debugger services involves using profiler services to track method entry and exit events. The disadvantage of this approach is that every function call incurs overhead because profilers track all method calls in an application. Using debugger services is more efficient because debug breakpoints are used to focus analysis only on those classes/methods that are of interest to the user.

In an implementation that utilizes profiler services, the SAE may use profiler callback functions for each method call in a program. Profilers typically require special instructions to be placed at the beginning of each method. These instructions are designed to interrupt the execution of the target program and allow a monitoring service to take control. In one exemplary implementation of the present invention, the SAE may be called by the profiler for each method call in a program being monitored. The SAE may then record the function call and inspect the target application memory, call stack, and registers to determine whether the function call is being conditionally executed or whether the function call is part of an execution loop.
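A minimal C++ sketch of the profiler-driven variant follows. The callback names onEnter and onLeave are hypothetical (real profiler interfaces differ); the sketch only shows how entry and exit notifications could maintain a shadow stack and record the call path at each entry. The callbacks counter illustrates why this approach is costly: every method entry and exit produces a callback, whether or not the user is interested in that method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical profiler-driven recorder: the profiler is assumed to invoke
// onEnter/onLeave for every method in the target, monitored or not.
class ProfilerRecorder {
public:
    std::vector<std::vector<std::string>> paths;  // call path recorded at each entry
    int callbacks = 0;                            // one per entry and per exit

    void onEnter(const std::string& method) {
        ++callbacks;
        shadow_.push_back(method);
        paths.push_back(shadow_);  // snapshot of the shadow stack
    }

    void onLeave(const std::string& method) {
        ++callbacks;
        // Pop only if the exit matches the innermost recorded entry.
        if (!shadow_.empty() && shadow_.back() == method) shadow_.pop_back();
    }

private:
    std::vector<std::string> shadow_;
};
```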

Although using profiler services is one possible method for analyzing the operational behavior of a computer program, using profiler services is less efficient than using debugger services because profiler services do not enable a user to selectively enable and disable monitoring of user-specified functions. A profiler requires that each function be analyzed, which adds unnecessary processing to the analysis of a computer program.

Another advantage to using debugger services over profiler services is that debugger services allow the sequence analysis engine to dynamically explore new interactions between functions using the single step debugger service combined with the breakpoint service. The ability to dynamically explore new interactions between functions is referred to herein as auto-discovery mode. Auto-discovery mode is not possible using profiler services because profiler services do not provide single stepping or breakpoint services.
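Auto-discovery mode can be pictured as a worklist traversal: each function already under a breakpoint is single-stepped, and every newly observed callee receives a breakpoint of its own. In this hypothetical C++ sketch, the results of single-stepping are simulated by a precomputed ObservedCalls table; in a live session they would come from the debugger's single step service. The names ObservedCalls and autoDiscover are assumptions for illustration.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Stand-in for what single-stepping would reveal: for each function, the
// callees reached by the call instructions it executes.
using ObservedCalls = std::map<std::string, std::vector<std::string>>;

// Starting from user-selected functions, add a breakpoint on every callee
// discovered while stepping through an already-monitored function.
std::set<std::string> autoDiscover(const ObservedCalls& observed,
                                   const std::vector<std::string>& seeds) {
    std::set<std::string> breakpoints(seeds.begin(), seeds.end());
    std::deque<std::string> toStep(seeds.begin(), seeds.end());
    while (!toStep.empty()) {
        const std::string fn = toStep.front();
        toStep.pop_front();
        auto it = observed.find(fn);
        if (it == observed.end()) continue;  // fn made no calls while stepped
        for (const std::string& callee : it->second)
            if (breakpoints.insert(callee).second)  // newly discovered
                toStep.push_back(callee);
    }
    return breakpoints;
}
```

Functions that are never reached from the seeds (such as unrelated parts of the program) never receive breakpoints, so analysis stays focused on the interactions the user asked about.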

A GUI application may be provided that allows a user to configure inputs for the SAE, to control the operation of the SAE (start, stop, pause), and to view and manipulate the outputs from the SAE. The SAE interacts with the debugger to control the target application and to examine and record its execution using input provided via the GUI. In an alternate implementation, the GUI may be omitted. In such an implementation, the SAE may be executed by the user from a command prompt. With this approach, the user may be responsible for manually editing the SAE input file. The SAE may write the analysis results to a text file that can be viewed by the user.

The methods and systems for analyzing operational behavior of a computer program described herein may be used in a variety of software development and testing scenarios. One such scenario is referred to as extreme programming. Rapid application development methodologies, such as extreme programming, advocate developing computer programs by skipping formal analysis and design and jumping immediately into programming. This approach encourages programmers to constantly refactor their programs until the desired solution is obtained. Unfortunately, this approach leaves few design artifacts for other software developers who later must maintain or add features to the program. A software tool, as described herein, automatically generates concise, easy-to-understand sequence diagrams that explain how certain aspects of the program function. This tool fits well into the extreme programming paradigm because it allows software engineers to focus on developing the program while automatically producing up-to-date documentation for communicating the design of the system.

Another application of the methods and systems described herein is behavioral model verification. Large software firms, especially those producing software for government agencies, are required to follow formal design methodologies. During the detailed design phase of software development, these firms produce design documentation for both the static and dynamic aspects of the system. After the design has been reviewed and approved, the actual software development begins. Often, the software development is performed by an entirely different group of people. As a result, there is often a disconnect between the designer's intent and the actual implementation. The formal software development process requires model verification after the software implementation is complete. At this stage, source code reviews are held and a determination is made as to whether the program has been implemented according to the design. Reviewing source code can be useful in verifying the static design of a system; however, it does not address the dynamic aspects of the design. A tool, as described herein, may be used to perform model verification of the behavioral aspects of the design.

Yet another application of the methods and systems described herein is automatic generation of computer program behavioral documentation for use in maintaining legacy software. The software life-cycle of a computer program typically includes the following: requirements, analysis, design, implementation, testing, installation, operation, maintenance, and retirement. Quite often, software engineers involved in the early design of a system are not the same individuals that are responsible for maintaining the system. In fact, in many software projects, by the time a product reaches the maintenance phase of the software life-cycle, the original designers of the software are no longer available, having changed projects or, in some cases, jobs. This presents a problem when a software maintenance engineer is attempting to fix a bug in a system that has outdated documentation. The maintenance engineer must spend countless hours poring over source code, debugging the application, or asking others for helpful insight. As described above, tools already exist that generate up-to-date documentation on the static design of a system. Unfortunately, static design documentation alone often cannot provide the maintenance engineer with enough information to isolate and fix program bugs. A tool that can automatically generate documentation that describes the behavior of a system would have a profound impact on reducing the cost of maintenance in this scenario. The maintenance engineer could run the tool under different conditions to generate behavioral diagrams that can be compared and used to isolate the problem. The methods and systems described herein provide such a tool.

The methods and systems described herein may include an approach for generating a dynamic view of a computer program that is concise and easily understood by the user. Such a tool has utility in rapid application development, formal software development, and in reducing the cost of maintaining legacy software systems.

Accordingly, it is an object of the invention to provide methods, systems, and computer program products for summarizing the operational behavior of a computer program.

It is another object of the invention to provide methods, systems, and computer program products for identifying loops and conditional execution in computer programs and displaying the loops and conditional execution to a user in a summarized format.

Some of the objects of the invention having been stated hereinabove, other objects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be explained with reference to the accompanying drawings of which:

FIG. 1 is a sequence diagram that does not include notation for illustrating conditional program execution or looping;

FIG. 2 is a diagram of a computer program that contains execution loops and conditional statements;

FIG. 3 is a sequence diagram illustrating execution of the program in FIG. 2 without notation for conditional execution or looping;

FIG. 4 is an example of a sequence diagram including loop and conditional execution notation that may be automatically produced by a method for summarizing operational behavior of a computer program according to an embodiment of the present invention;

FIG. 5 is a summarized call tree illustrating relationships between method calls;

FIG. 6 is a summarized call tree illustrating an execution loop by maintaining loop counters for each called method;

FIG. 7 is a diagram illustrating source code and corresponding assembly language code used for detecting conditional execution of a function call using a sequence analysis engine according to an embodiment of the present invention;

FIG. 8 is a flow chart illustrating exemplary steps for post-processing of intermediate code to generate conditional and loop notation for a sequence diagram according to an embodiment of the present invention;

FIG. 9 is a unified modeling language (UML) class diagram illustrating relationships between exemplary classes used to summarize operational behavior of a computer program according to an embodiment of the present invention;

FIG. 10 is a block diagram illustrating exemplary relationships between a shadow stack and a summarized call tree used for summarizing operational behavior of a computer program according to an embodiment of the present invention;

FIG. 11 is a flow chart illustrating exemplary steps that may be performed by a sequence analysis engine in summarizing operational behavior of a computer program according to an embodiment of the present invention;

FIG. 12 is a flow chart that illustrates an exemplary process that may be performed by a sequence analysis engine in updating a shadow stack and a summarized call tree according to an embodiment of the present invention;

FIG. 13 is a flow chart illustrating exemplary steps that may be performed by a sequence analysis engine in detecting execution loops and storing loop counts according to an embodiment of the present invention;

FIG. 14 is a block diagram illustrating an exemplary overall architecture of a system for summarizing operational behavior of a computer program according to an embodiment of the present invention;

FIG. 15 is a block diagram illustrating exemplary debugger services that may be used by a sequence analysis engine according to an embodiment of the present invention;

FIG. 16 is a block diagram illustrating data flow and exemplary relationships between components of a system for summarizing operational behavior of a computer program according to an embodiment of the present invention;

FIG. 17 is a flow chart illustrating exemplary steps for initiating analysis of a target application according to an embodiment of the present invention; and

FIG. 18 is a flow chart illustrating exemplary steps performed when a user stops analysis of a target application according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The following terms and definitions are used in explaining details of embodiments of the invention:

API—stands for application programming interface, a standard used by computer programmers to allow operating systems and software applications to understand one another.

breakpoint—a location in a program at which the debugger stops program execution. Breakpoints aid in the testing and debugging of programs.

C++—an object-oriented programming language based on the C language.

call or function call—an expression that moves the path of execution from the current function to a specified function and evaluates to the return value provided by the called function.

call stack—the list of procedures and functions currently active in a program.

call tree—a data structure used to record a computer program's function call sequence. Also, a display that documents function usage hierarchy.

class—one of the key concepts in object-oriented programming, a class is the most general kind of user-defined type, defining both the state information used by objects of the class (data members) and their behavior (member functions). Classes may be related to one another via inheritance relationships, where base classes define portions of the interface and/or implementation of derived classes.

class diagram—a diagram that shows a collection of declarative (static) model elements, such as classes, types, and their contents and relationships.

collaboration diagram—a diagram that shows object interactions organized around the objects and their links to each other. Unlike a sequence diagram, a collaboration diagram shows the relationships among the objects. Sequence diagrams and collaboration diagrams express similar information, but show it in different ways.

conditional statement—in a programming language, a statement (for example, the if statement) that evaluates one or more variables or conditions and uses the result to choose one of several possible paths through the subsequent code.

debug symbols—information used by a debugger to link machine instructions to higher level source code and to display variable names and type information.

debugger—a software tool that is used to detect the source of program or script errors by performing step-by-step execution of application code and viewing the content of code variables.

disassemble—to transform machine codes into assembler language.

function—a specialized group of statements used to encapsulate general or program-specific tasks.

guard condition—a condition that must be satisfied in order for a block of code to be executed.

GUI—an acronym for graphical user interface. This term refers to a software front-end meant to provide an attractive and easy-to-use interface between a computer user and an application.

intermediate language—in computer programming, a target language into which all or part of a single statement or a source program in a source language is translated before it is further translated or interpreted.

loop—a set of instructions executed repeatedly as long as some condition is met.

machine language—a binary language (using only 0s and 1s); the only programming language the computer understands. All programs written in higher-level languages must be translated into machine language before they can be executed.

method—an operation defined for an object, implemented as a procedure or function in a programming language.

object—in object-oriented design or programming, a concrete realization of a class that consists of data and the operations associated with that data.

object interaction diagram—a diagram that shows the dynamic message-passing relationship between objects, including which object owns the data being passed and which object owns the service being called.

process—a process is a single executable module that runs concurrently with other executable modules.

sequence diagram—a diagram that shows object interactions arranged in time sequence. In particular, it shows the objects participating in the interaction and the sequence of messages exchanged. Unlike a collaboration diagram, a sequence diagram includes time sequences but does not include object relationships.

shadow stack—a data structure that reflects the current contents of the program's call stack.

single step—a debugger command that allows an application program to execute one line of the program, which can either be a single assembly language instruction or a single high level language instruction. There are typically two distinct single step commands: one that will step “into” subroutine calls and one that will step “over” them.

source code—the readable form of code created by a programmer in a high-level programming language. Source code is converted to machine-language object code by a compiler or interpreter.

state diagram—a model of the states of an object and the events that cause the object to change from one state to another.

thread—the basic unit of program execution. A process can have several threads running concurrently. Each thread can be performing a different job, such as waiting for events or performing a time-consuming task that the program does not need to complete before the program continues. Generally, when a thread has finished performing its task, the thread is suspended or destroyed.

UML—stands for unified modeling language. UML is a standard notation and modeling technique for analyzing real-world objects, developing systems, and designing software modules using an object-oriented approach. UML has been adopted as a standard by the Object Management Group (OMG), a consortium that creates standard architectures for object technology.

The present invention may include a system that can automatically generate a diagram that illustrates the operational behavior of a software program. In one exemplary implementation, a system for summarizing the operational behavior of a computer program may automatically generate a sequence diagram including notation for illustrating conditional execution and looping. Alternate implementations may include generation of other behavioral views of the system, including collaboration diagrams, state diagrams, and object interaction diagrams.

Before explaining details of how conditional execution and looping can be automatically identified, sequence diagrams will first be explained. A sequence diagram describes the interaction between a set of objects for a given scenario. FIG. 1 provides a simple example of a sequence diagram that illustrates the interaction between objects A, B, and C for an unspecified scenario. The objects are indicated in boxes 100, 105, and 108. The vertical line, such as vertical line 115, extending from each object is referred to as the object's lifeline, and it represents the time for which the object exists in the system. Time flows from the top of the diagram towards the bottom of the diagram, as indicated by arrow 120. The horizontal lines, such as line 125, between the object lifelines represent messages sent (or methods called) from one object to another. For example, horizontal line 125 shows that object A 100 has invoked the doX method of object B 105. Horizontal dashed line 130 indicates that control has returned to object A.
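As an illustration only, the core content of a sequence diagram such as FIG. 1 can be held as a time-ordered list of messages, where vector order corresponds to the top-to-bottom flow of time. The Message structure and the arrow-style text rendering below are an ad hoc notation chosen for this sketch, not a format defined by UML.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One horizontal arrow in a sequence diagram; position in the vector is
// time order, matching the top-to-bottom flow of the diagram.
struct Message {
    std::string from;
    std::string to;
    std::string label;
    bool isReturn;  // dashed return arrow rather than a call
};

// Render messages as a plain-text listing, one arrow per line:
// "->" for a call (solid line), "-->" for a return (dashed line).
std::string renderMessages(const std::vector<Message>& messages) {
    std::ostringstream out;
    for (const Message& m : messages)
        out << m.from << (m.isReturn ? " --> " : " -> ") << m.to << " : " << m.label << '\n';
    return out.str();
}
```

For example, the call shown by line 125 and the return shown by dashed line 130 would be listed as `A -> B : doX` followed by `B --> A : return`.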

The notation used in FIG. 1 is not sufficient to concisely illustrate more complex behavior, which is typical of most programs. For example, in the C++ code illustrated in FIG. 2, the method “someMethod” 200 contains a simple “for” loop 210 that contains an “if-else” statement 220 and 230. The “if” portion of the “if-else” statement calls doX method 240 of object b with a value of true, and the “else” portion of the “if-else” statement calls the doX method of object b with a value of false. If the bFlag parameter of the doX method is true, the doX method creates an object c of class C and calls doY, which prints the screen output “Hello World.” If bFlag is false, no screen output is produced. As illustrated by the “if” statement indicated by reference numeral 220, bFlag is set to true only on the fifth iteration of the “for” loop indicated by reference numeral 210. Thus, executing the program illustrated in FIG. 2 results in doX being called with a value of false in iterations 1-4 and 6-10, and in doY being called from doX in iteration 5.
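The following C++ program is a reconstruction of the FIG. 2 code from the description above, not a copy of the figure; details such as b being a member of class A and the loop running from 1 to 10 are inferred from the text and may differ from the original. The helper captureSomeMethodOutput is not part of FIG. 2 and exists only so that the program's output can be checked.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

class C {
public:
    void doY() { std::cout << "Hello World" << std::endl; }
};

class B {
public:
    void doX(bool bFlag) {
        if (bFlag) {
            C c;  // object c of class C is created only when bFlag is true
            c.doY();
        }
    }
};

class A {
public:
    void someMethod() {
        for (int i = 1; i <= 10; ++i) {  // "for" loop 210
            if (i == 5) {                 // "if" clause 220
                b.doX(true);
            } else {                      // "else" clause 230
                b.doX(false);
            }
        }
    }

private:
    B b;
};

// Helper (not part of FIG. 2): run someMethod and capture what it prints.
std::string captureSomeMethodOutput() {
    std::ostringstream captured;
    std::streambuf* previous = std::cout.rdbuf(captured.rdbuf());
    A a;
    a.someMethod();
    std::cout.rdbuf(previous);  // restore the real console stream
    return captured.str();
}
```

Running someMethod prints "Hello World" exactly once, even though doX is called ten times, which is precisely the behavior that a flat sequence diagram cannot explain.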

While the operation of someMethod can be understood by examining the source code, it would be difficult to truly understand the program flow using the limited notation from FIG. 1. FIG. 3 attempts to illustrate the behavior of “someMethod” 200 using the limited notation from FIG. 1. The sequence diagram shown in FIG. 3 lacks notation to illustrate the “for” loop and the “if-else” statement, yet these elements are required to accurately depict the true behavior of “someMethod” 200. In FIG. 3, the method doX is called ten times by someMethod of object A 300. In the fifth call, the doX method of object B 310 calls the doY method of object C 320. However, nothing in the notation of FIG. 3 indicates why doX was called ten times or why doY was called only during the fifth call to doX. Fortunately, the UML standard for sequence diagrams includes notation that can be used to represent execution loops and conditional execution. However, the UML standard does not specify a mechanism for generating such notation. As a result, the UML notation for loops and conditional execution has typically been generated manually by programmers.

The notation in FIG. 3 can be contrasted with the notation of FIG. 4, which is a UML sequence diagram of “someMethod” 200 that utilizes notation for representing execution loops and conditional statements. The notation for representing execution loops and conditional statements in FIG. 4 can be automatically generated using the methods, systems, and computer program products for analyzing operational behavior of a computer program described herein. Conventionally, such notation had to be generated by a programmer or software engineer through manual analysis of source code and program output. Such a manual process is labor intensive and subject to human error.

In FIG. 4, “for” loop 210 from FIG. 2 is illustrated using loop operator 400. The scope of “for” loop 210 is illustrated utilizing UML notation referred to as an interaction frame 405. Reference numeral 410 indicates the guard condition for “for” loop 210. The “if-else” statement 220 from FIG. 2 is depicted using an interaction frame and the ALT (short for alternative) operator 415. Horizontal dashed line 420 provides a boundary between the “if” clause and the “else” clause of “if-else” statement 220. Everything above dashed line 420 and within ALT interaction frame 425 occurs when guard condition 430 evaluates to true. In the illustrated example, if guard condition 430 evaluates to true, object A calls b.doX with a value of true, as indicated by reference numeral 435. b.doX then calls c.doY, as indicated by reference numeral 440, when guard condition 445 is true. OPT interaction frame 450 indicates that doY executes only if guard condition 445 is true. OPT interaction frame 450 is equivalent to an ALT interaction frame, such as ALT interaction frame 425, with only one option.

Everything below dashed line 420 and within ALT interaction frame 425 occurs when guard condition 430 evaluates to false. In the illustrated example, when guard condition 430 is false, guard condition 455 is true and doX is called with a value of false, as indicated by reference numeral 460.
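The frame structure of FIG. 4 can be captured as a small tree in which each node is either a message or an interaction frame (LOOP, ALT, OPT) holding one or more guarded operands. The following C++ sketch is illustrative only: the Node, Operand, message, frame, and renderOutline names and the indented text outline are hypothetical, and because the description does not reproduce the exact guard text of FIG. 4, any guard strings supplied to it would be placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Node;

// One guarded operand of an interaction frame, e.g. the "if" or "else" part of ALT.
struct Operand {
    std::string guard;
    std::vector<std::shared_ptr<Node>> body;
};

// Either a message (no operands) or an interaction frame (LOOP, ALT, OPT).
struct Node {
    std::string text;
    std::vector<Operand> operands;
};

std::shared_ptr<Node> message(std::string text) {
    return std::make_shared<Node>(Node{std::move(text), {}});
}

std::shared_ptr<Node> frame(std::string op, std::vector<Operand> operands) {
    return std::make_shared<Node>(Node{std::move(op), std::move(operands)});
}

// Indented outline: the operator and first guard open the frame, "--" marks
// each later operand (like dashed line 420), and "end" closes the frame.
void renderNode(const Node& n, int depth, std::ostream& out) {
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    if (n.operands.empty()) {
        out << indent << n.text << '\n';
        return;
    }
    for (std::size_t k = 0; k < n.operands.size(); ++k) {
        const Operand& op = n.operands[k];
        out << indent << (k == 0 ? n.text : std::string("--")) << ' ' << op.guard << '\n';
        for (const auto& child : op.body) renderNode(*child, depth + 1, out);
    }
    out << indent << "end\n";
}

std::string renderOutline(const Node& root) {
    std::ostringstream out;
    renderNode(root, 0, out);
    return out.str();
}
```

For the FIG. 2 scenario, a LOOP frame guarded by the loop condition would contain an ALT frame whose first operand holds b.doX(true) followed by an OPT frame around c.doY(), and whose second operand holds b.doX(false).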

The sequence diagram in FIG. 4 provides a much more concise and accurate depiction of the true behavior of the program than the sequence diagram in FIG. 3. One embodiment of the invention provides a method of summarizing the execution of a computer program in such a way as to enable generation of UML sequence diagrams containing loop and conditional execution notation as shown in FIG. 4. This method provides detailed steps for monitoring the execution flow in order to detect and record situations in which a program is executing a block of statements in a loop or has conditionally executed a block of statements. Although this embodiment generates UML sequence diagrams to illustrate the execution flow of the program, alternative approaches may generate other behavior-oriented diagrams, such as object interaction and collaboration diagrams. In some situations, it may be desirable to operate the SAE, which will be described in detail below, without generating a diagram. For example, the raw SAE output can be analyzed and compared to output from previous analysis sessions to verify model compliance and to highlight areas where the design has changed.

Most modern computer programs are written in a high level language, referred to as a source code language. The syntax of these high level languages provides the ability to make function calls, to conditionally execute a block of code, and to execute the same block of code in a loop until some condition is met. The present invention may include analyzing the execution flow by monitoring function/method calls made by a program. In addition, the present invention may include intelligent monitoring that can detect when the program is in an execution loop or when a program has conditionally executed a block of code. This information may be used to summarize the behavior of a program by expressing the function call sequences with loop and conditional execution notation, as illustrated in FIG. 4.

A function call allows a program to temporarily branch to another location to execute a series of statements and then to return to the point of origin and continue execution. Functions are blocks of code (computer statements) that can be called in order to perform a specific task. In object-oriented programming, objects have actions, referred to as methods, which can be invoked by other objects. A method invocation is analogous to calling a function in a non-object-oriented computer language. Many object-oriented programs are composed of hundreds, if not thousands, of objects, each object having many methods. In object-oriented languages, methods are used to express the actions that can be performed by an object. Software engineers who are designing object-oriented programs utilize sequence diagrams to document the interaction between objects and the behavior of a program. In contrast to sequence diagrams, class diagrams are used to document the static design of an object-oriented system. The present invention may include the ability to monitor the interaction between the objects that comprise a computer program. Although one exemplary implementation described herein focuses on analyzing object-oriented programs, the methods and systems described herein for analyzing operational behavior of a computer program can also be used to monitor the interaction of modules and associated functions of a non-object-oriented application, such as a procedure-oriented application.

In one implementation of the present invention, interactions between methods or functions in a computer program may be recorded into a data structure referred to herein as a summarized call tree, as illustrated in FIG. 5. A summarized call tree is a hierarchical representation of present and past sequences of method calls made by a program. Each branch of the tree corresponds to method calls that were made during execution of the program. A summarized call tree differs from a standard call tree because, rather than adding a node for every call, it summarizes the method calls that were made while in an execution loop. The first instance of a method call made while executing in a loop will result in a single child node corresponding to the called method being added to the tree. Subsequent calls to the method from within the execution loop are recorded by incrementing a local loop count associated with the call node. After the execution analysis has ended, the summarized call tree can be used to identify method calls that were made while in an execution loop. The ability to identify calls that were made in an execution loop is critical to producing sequence diagrams that use loop notation to express the behavior of a system.
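The summarizing behavior described above can be sketched as a small tree whose nodes carry a loop count. This is a minimal illustration, not the SAE's actual data layout: the names `CallNode`, `record_child`, and `call_site` are hypothetical, and a repeat is assumed to be "the same call" when both the called method and the call-site offset in the caller match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One node of a summarized call tree (hypothetical layout)."""
    method: str
    call_site: int = 0   # offset of the call instruction in the caller (assumed key)
    loop_count: int = 1
    children: list[CallNode] = field(default_factory=list)

    def record_child(self, method: str, call_site: int = 0) -> CallNode:
        """Record a call made from this node's method.

        A repeat of the same call (same callee, same call site) increments the
        existing child's loop count instead of adding another node, which is
        what distinguishes a summarized call tree from a standard call tree.
        """
        for child in self.children:
            if child.method == method and child.call_site == call_site:
                child.loop_count += 1
                return child
        child = CallNode(method, call_site)
        self.children.append(child)
        return child


if __name__ == "__main__":
    # Reproduce FIG. 6: methodA calls methodB ten times from one call site.
    root = CallNode("entryPoint")
    method_a = root.record_child("methodA", 0x10)
    for _ in range(10):
        method_a.record_child("methodB", 0x20)
    print(method_a.children[0].loop_count)  # prints 10
```

Keying on the call site, not only on the method name, keeps two textually distinct calls to the same method in separate nodes, so the tree still points at an exact location in the calling method.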

In FIG. 5, node 500 represents the root node or entry point method of the summarized call tree, which is named entryPoint. Node 510 represents methodA being called from entryPoint and is added to the summarized call tree with a loop counter of 1 the first time that entryPoint calls methodA. Node 520 represents methodB being called from methodA and is added to the summarized call tree with a loop counter of 1 the first time that methodA calls methodB. The purpose of the summarized call tree in FIG. 5 is to record enough information to track the context when each method is called so that sequence diagrams, such as the sequence diagram illustrated in FIG. 4, can be automatically generated with notation to illustrate conditional execution and looping.

FIG. 6 illustrates another example of a summarized call tree. In FIG. 6, node 600 represents the entry point method, which is named entryPoint. Node 610 represents the first calling of methodA from entryPoint. Node 620 represents methodB being called from methodA. In node 620, the loop count associated with methodB is ten, indicating that methodB was called ten times by methodA. A mechanism for automatically generating a summarized call tree, such as those illustrated in FIGS. 5 and 6, and using this information to summarize operational behavior of a computer program will be described in detail below.

As stated above, one important aspect of the present invention is the ability to automatically identify conditionally executed blocks of code, referred to as conditional statements, and to add conditional execution notation to sequence diagrams. Returning to FIG. 2, the exemplary source code contains conditional statements. The first conditional statement is an “if” statement 220 that also contains an “else” clause 230. This conditional statement can be interpreted as saying “if variable i is equal to 5, then execute b.doX(true) 240; otherwise execute b.doX(false) 250.” The lowercase “b” in the b.doX( ) method call represents the object “b,” which is an instance of class “B.” The characters “doX” in the method call represent the doX method of object b being called. The value within the parentheses, “true” or “false,” represents the value being passed to the method doX. In the source code example, b.doX is only passed a value of “true” on the fifth time through the enclosing execution loop, see 240. A sequence diagram would depict the conditional execution of this statement by using ALT notation 415 to represent the “if, else” statement. However, with conventional software engineering tools, such diagrams must be generated manually.

The inclusion of conditional execution notation in a sequence diagram helps to explain the conditions under which a method or a block of methods is called. A sequence analysis program, such as a sequence analysis engine according to an embodiment of the present invention, preferably captures the conditional execution flow so that generated sequence diagrams can provide conditional execution notation.

In order to detect conditional execution, the SAE may record method calls between selected classes based on filter criteria established by the user. Recording a method call involves recording both the callee and the caller. The caller contains information about the origin of a call. The callee contains information about the destination of a call. Each call that is monitored may be placed in a summarized call tree, which may later be used to create visual representations of the execution flow.

In one exemplary implementation, the SAE performs post-analysis of the calls contained in the summarized call tree. The main purpose of this post-analysis is to identify calls made from blocks of statements contained by a conditional statement. This is accomplished by analyzing the machine/intermediate language code for each caller contained in the call tree and locating low level computer instructions that were generated from high level language conditional statements. These low level instructions are referred to as conditional branching instructions because they allow the processor to jump over blocks of instructions based on some condition. Example conditional branch instructions that the SAE may analyze in detecting conditional execution include machine instructions such as jz, jne, and jge (jump if zero, jump if not equal, and jump if greater than or equal). These instructions require a location to jump to if the condition is met.

FIG. 7 illustrates an example of how a high level “if” statement may be broken down into low level assembly statements suitable for post-analysis of function calls to identify guard conditions and loop counts according to an embodiment of the present invention. In FIG. 7, reference numeral 700 represents the original source code containing an “if” statement 710. In the source code, if i==5, the method testSubject.doSomething is executed, as indicated by reference numeral 720. The block of code represented by reference numeral 730 shows how each high level computer statement is broken down into one or more low level assembly instructions. Assembly instructions 740 correspond to the “if” statement 710. Assembly instructions 750 contain the JNE (Jump Not Equal) assembly instruction. The computer will jump over the instructions 750, which correspond to testSubject.doSomething, if the preceding comparison operation indicates that the value stored in the ESI register is not equal to five. In other words, the assembly code 750 is conditionally executed, the condition being i==5. The SAE may store the scope of the conditionally executed assembly block 750 by recording the location of the JNE instruction and the jump target location 760. In this example, the scope of the conditional block would be the open range (0x17, 0x27). Any statements contained in this range would be identified by the sequence analysis engine as being conditionally executed. All calls in the call tree whose call instructions lie within this range are then flagged as belonging to the same conditionally executed block of call statements.
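The open-range test described above can be sketched as follows. This is an illustrative helper, not the SAE's code: `ConditionalScope` and `flag_conditional_calls` are hypothetical names, and call sites are assumed to be given as instruction offsets within the calling method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalScope:
    """Scope recorded from a forward conditional branch, e.g. the JNE at 0x17
    whose jump target is 0x27 in FIG. 7."""
    branch_offset: int
    target_offset: int

    def contains(self, offset: int) -> bool:
        # Open range: neither the branch itself nor the jump target is inside.
        return self.branch_offset < offset < self.target_offset


def flag_conditional_calls(call_sites, scopes):
    """Map each call-site offset to the conditional scopes that enclose it."""
    return {site: [s for s in scopes if s.contains(site)] for site in call_sites}


if __name__ == "__main__":
    scope = ConditionalScope(0x17, 0x27)
    # A call at 0x1F (inside the block) is flagged; one at 0x10 (before it) is not.
    print(flag_conditional_calls([0x10, 0x1F], [scope]))
```

Calls that map to the same scope object are the ones that would be drawn inside a single ALT or OPT interaction frame.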

In one exemplary implementation of the sequence analysis engine, debugger symbols can be used to locate the line(s) of source code that correspond to the analyzed branch instructions. This information can then be presented on a sequence diagram, as depicted in FIG. 4, using ALT guards, such as guards 430 and 455 illustrated in FIG. 4. In some situations, debug symbols may not be available. In this case, the GUI application (described in detail below) may allow the user to manually annotate the generated diagrams.

FIG. 8 further illustrates an exemplary process that may be executed by a sequence analysis engine according to an embodiment of the present invention in determining the scope of conditional and loop statements that may contain one or more call statements. The process of determining conditional or loop statement scope begins in step 800. Provided the method has not already been analyzed (step 810), the first step in the process is to load the machine/intermediate code associated with the method being analyzed (step 820). Next, the first instruction is disassembled (step 830). In step 840, it is determined whether the current intermediate instruction being analyzed is a forward conditional branch. If the instruction is a conditional branching instruction that has a target offset greater than the offset of the branch instruction, then the branch is considered a forward conditional branch. In FIG. 7, the jne instruction located at offset 0x17 has a target offset of 0x27. Thus, the jne statement would be identified as a forward conditional branch.

Forward conditional branching is indicative of conditionally executed code blocks. If the disassembled instruction is a forward conditional branch, then the starting and ending offsets of the conditionally executed block of code are stored for later use (step 850). If the current instruction is not a forward conditional branch, control proceeds to step 860, where it is determined whether the current instruction is a looping instruction. If the current instruction is a branch that specifies a target offset less than its own offset, then a loop has been detected. In this case, the starting and ending offsets of the loop are stored for later use (step 870).

After checking for the existence of forward conditional branches and backward conditional branches or loops, the SAE advances to the location of code where the next instruction is located (steps 880 and 890). If the method contains additional instructions, then the process described above is repeated starting at step 830. If all of the intermediate language instructions have been analyzed, the process of checking for loops and conditional execution ends in step 895.
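The single pass of FIG. 8 (steps 830 through 890) can be sketched over an already-disassembled instruction list. The sketch assumes a hypothetical `Instruction` tuple and an illustrative set of x86-style mnemonics; a real implementation would take these from a disassembler, and the per-method caching of step 810 is not modeled.

```python
from typing import NamedTuple, Optional

# Illustrative mnemonic sets (assumed); a disassembler would classify these.
CONDITIONAL_BRANCHES = {"jz", "jnz", "je", "jne", "jg", "jge", "jl", "jle"}
BRANCHES = CONDITIONAL_BRANCHES | {"jmp"}


class Instruction(NamedTuple):
    offset: int
    mnemonic: str
    target: Optional[int] = None  # branch target offset, if the instruction branches


def find_scopes(instructions):
    """Return (conditional_scopes, loop_scopes) as lists of (start, end) offsets."""
    conditionals, loops = [], []
    for ins in instructions:                      # steps 830/880/890: one pass
        if ins.mnemonic not in BRANCHES or ins.target is None:
            continue
        if ins.mnemonic in CONDITIONAL_BRANCHES and ins.target > ins.offset:
            conditionals.append((ins.offset, ins.target))   # steps 840-850
        elif ins.target < ins.offset:
            loops.append((ins.target, ins.offset))          # steps 860-870
    return conditionals, loops


if __name__ == "__main__":
    code = [
        Instruction(0x10, "cmp"),
        Instruction(0x17, "jne", 0x27),   # forward conditional branch (FIG. 7)
        Instruction(0x1F, "call"),
        Instruction(0x27, "inc"),
        Instruction(0x2C, "jl", 0x08),    # backward branch closing a loop
    ]
    print(find_scopes(code))
```

A backward branch is recorded as (target, branch offset), so the loop body is the range that begins at the jump target and ends at the branch that repeats it.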

The conditional and loop statement scopes that are recorded in this phase are later used when rendering diagrams, such as sequence diagrams, that illustrate the execution flow of the program. For example, in the case of a sequence diagram, if a block of call statements is contained within the starting and ending offsets of a conditional statement, an interaction frame can be drawn around the method calls. Returning to FIG. 4, interaction frame 425 contains a method call 435. In addition to drawing the interaction frame, the sequence analysis engine may use the scope information stored for the conditional statement, along with debug symbols, to add the guard notation containing the line of source code that was associated with the low level forward conditional branch. In FIG. 4, reference numeral 430 represents the guard condition (i==5) associated with the jne statement in FIG. 7. Similarly, loop statement scope information may be used to render the loop interaction frame and loop guard condition. In FIG. 4, frame 400 is a loop interaction frame rendered for the loop in the source code illustrated in FIG. 2. Reference numeral 410 illustrates the guard condition (for int i=1; i≦10; i++) for the loop. Such guard conditions are invaluable in illustrating the operational behavior of a program. Because the present invention is capable of automatically identifying execution loops, conditional execution, and the function calls made within the loops or conditionally executed blocks of code, sequence diagrams containing guard condition notation, such as that illustrated in FIG. 4, can be automatically generated.

Most computer programs contain blocks of statements that are executed repeatedly in an execution loop. The following pseudo-code, which reads each line from a file and prints it to the computer console, illustrates the value of execution loops in computer programs:

    While Not EndOfFile(someFile) Do
        text = ReadLine(someFile)
        Console.Print text
    EndWhile

Without a looping construct, this program would be difficult, if not impossible, to write. Since the programmer may not know ahead of time how many lines of text are contained in the file, it would be impossible to know how many ReadLine function calls to make in order to read all lines of text contained in the file. The “while” loop provides a concise mechanism for expressing which statements are to be executed in the loop and the condition that must be met in order to continue looping over the statements.

Sequence diagrams provide notation for expressing method calls that are made inside of an execution loop. In FIG. 4, an execution loop is expressed in UML 2.0 notation. Loop operator 400 specifies that all function calls made within interaction frame 405 are to be repeated in an execution loop while guard condition 410 is met.

The present invention may include a method for summarizing the execution flow of a computer program by identifying method calls that were made in the context of an execution loop. This enables the generation of sequence diagrams, such as the one in FIG. 4, that concisely describe the behavior of a computer program using loop notation.

As described above, a summarized call tree contains nodes that represent unique execution paths made by a computer program. If the same path is encountered more than once, a loop counter is incremented at the point in the tree where the path was repeated. For example, if the call sequence entryPoint→methodA→methodB occurs, a sequence analysis engine may store this information in a summarized call tree as illustrated in FIG. 5. The summarized call tree stores the method call information in a tree data structure. Each node of the tree represents a method call. The directional arrow represents the relationship between a parent call and a child call. A parent call represents a method that executed the child call. The summarized call tree differs from a normal call tree because in a summarized call tree a loop count is incremented each time the call occurs, whereas in a normal call tree a call object is stored for each call. Each call in FIG. 5 has a loopCount equal to 1. This indicates that each call has been made only once and therefore has not been made in a loop.

As described above, FIG. 6 illustrates the summarized call tree if methodB was called by methodA 10 times while in an execution loop. In this case, the loopCount associated with methodB has the value 10. Since entryPoint and methodA each have loopCount set to 1, we know that methodB was called 10 times by methodA. In addition, we know that it was executed in a loop, as opposed to individual sequential calls, since the call information stored in the summarized call tree represents an exact location within the method that executed the call.

One problem that must be overcome is to identify where the execution loop actually exists. It is not enough to know that the sequence entryPoint→methodA→methodB was encountered 10 times, because entryPoint could be calling methodA in a loop, methodA could be calling methodB in a loop, or both entryPoint and methodA could contain execution loops. The present invention includes a mechanism to overcome this problem. In one exemplary implementation, the present invention may utilize a data structure, referred to herein as a shadow stack, to record the number of times a method call was made by the calling function while that function was on the call stack, i.e., before the calling function returns. The shadow stack may contain a snapshot of the executing program's call stack at a specific time. The SAE maintains the shadow stack by actively monitoring method calls and method returns. When a new method call is monitored, the information about the call is pushed onto the top of the shadow stack. When the call returns to the caller, the corresponding entry is popped from the shadow stack. The call information is stored in an object referred to herein as a call object. The call object may be stored or encapsulated in another object, referred to herein as a StackFrame object. The StackFrame object may be stored on the shadow stack.

FIG. 9 is a UML class diagram that illustrates the class relationships among a shadow stack 900, a StackFrame 910, and a call 920. Shadow stack 900 is a stack data structure that contains zero or more StackFrame objects 910. Each StackFrame object 910 has a reference to a call object 920. A call object 920 can contain zero or more child call objects. In addition, each call object 920 contains references to a caller object 930 and a callee object 940. A caller object represents the method that originated the call. In addition, caller object 930 contains the instruction pointer of the next machine/intermediate language statement to execute upon return of the call. Callee object 940 represents the method that is being called by the caller. The callee refers to the object that contains the called method.
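The class relationships of FIG. 9 can be sketched with Python dataclasses. All field and method names here (`return_ip`, `owner`, `push`, `pop`, `top`) are hypothetical choices for illustration; the figure specifies only the associations and multiplicities, which the sketch mirrors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Caller:
    method: str       # method that originated the call
    return_ip: int    # next instruction to execute when the call returns


@dataclass(frozen=True)
class Callee:
    method: str       # method being called
    owner: str        # object that contains the called method


@dataclass
class Call:
    caller: Caller
    callee: Callee
    children: list[Call] = field(default_factory=list)  # zero or more child calls


@dataclass
class StackFrame:
    call: Call        # each frame references exactly one call object


@dataclass
class ShadowStack:
    frames: list[StackFrame] = field(default_factory=list)  # zero or more frames

    def push(self, call: Call) -> None:
        """Wrap the call object in a StackFrame and place it on top."""
        self.frames.append(StackFrame(call))

    def pop(self) -> Call:
        """Remove the top frame (the call has returned) and return its call."""
        return self.frames.pop().call

    def top(self) -> Call | None:
        return self.frames[-1].call if self.frames else None
```

Keeping the call object separate from its StackFrame wrapper is what lets the same call object live on in the summarized call tree after its frame is popped.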

In one exemplary implementation, the SAE places a reference to the call object into the summarized call tree, described above, when the call object is created and placed in a frame of the shadow stack. The summarized call tree maintains a history of each method call made by the program. The shadow stack maintains a record of each call that is currently represented on the program's call stack. FIG. 10 illustrates the relationship between shadow stack 900 and a summarized call tree 1000. Shadow stack 900 contains a collection of StackFrame objects 1010, 1020, and 1030. Each StackFrame object 1010, 1020, and 1030 represents the context of a method that is in the process of being executed. Each StackFrame object 1010, 1020, and 1030 has a reference to a call object. Summarized call tree 1000 also contains references to call objects that are currently on shadow stack 900, as indicated by reference numerals 1040, 1050, and 1060. Summarized call tree 1000 also contains references to call objects that were previously on shadow stack 900, as indicated by reference numerals 1075-1095.

As new calls occur, corresponding call objects are encapsulated by a StackFrame object, which is in turn pushed onto shadow stack 900. When a call returns to the originating method, the corresponding StackFrame object is popped from shadow stack 900 and then discarded. The main distinction between the roles of summarized call tree 1000 and shadow stack 900 is that summarized call tree 1000 maintains a historic collection of all calls that have been monitored by the system, whereas shadow stack 900 references only those calls that are “active” on the monitored program's call stack.

In FIG. 10, stack frame object 1030 represents the first function call made by the program that has not yet returned. This information is recorded in summarized call tree 1000 by call object 1040. When another function is called from the function corresponding to stack frame object 1030, stack frame object 1020 is added to shadow stack 900. Similarly, call object 1050 is added to summarized call tree 1000. When the next function is called from the function that corresponds to stack frame object 1020, stack frame object 1010 is added to shadow stack 900. Similarly, call object 1060 is added to summarized call tree 1000. When the function that corresponds to stack frame object 1010 returns, it is removed from shadow stack 900. However, call object 1060 will remain in summarized call tree 1000. Thus, summarized call tree 1000 stores a history of past instances of shadow stack 900.

After the function corresponding to stack frame object 1010 returns, the next function called within the function that corresponds to stack frame object 1020 will be added to the branch of summarized call tree 1000 after call object 1050. For example, the next call may be indicated by call object 1095. Thus, by maintaining a shadow stack that contains objects corresponding to functions that have not returned and a summarized call tree that represents a history of functions that have been called and the context between the function calls, the present invention allows automatic generation of summarized computer program information.
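The FIG. 10 walkthrough above can be sketched with a minimal monitor in which every call is recorded in the tree but only active calls stay on the stack. `CallMonitor`, `on_call`, and `on_return` are hypothetical names; StackFrame wrappers are omitted and repeated calls are not yet summarized (matching against existing nodes is the job of the FIG. 12 process).

```python
class Call:
    """Minimal call object: a method name plus the calls it made."""

    def __init__(self, method):
        self.method = method
        self.children = []


class CallMonitor:
    """Hypothetical monitor: the tree keeps every call, the stack only active ones."""

    def __init__(self, entry_point):
        self.root = Call(entry_point)          # root of the summarized call tree
        self.shadow_stack = [self.root]        # StackFrame wrappers omitted

    def on_call(self, method):
        call = Call(method)
        self.shadow_stack[-1].children.append(call)  # history kept in the tree
        self.shadow_stack.append(call)               # active on the shadow stack
        return call

    def on_return(self):
        self.shadow_stack.pop()                # popped from the stack, kept in the tree


if __name__ == "__main__":
    m = CallMonitor("entryPoint")  # plays the role of call object 1040
    m.on_call("methodA")           # like call object 1050
    m.on_call("methodB")           # like call object 1060
    m.on_return()                  # methodB returns: gone from the stack only
    m.on_call("methodC")           # next call from methodA, like call object 1095
    print([c.method for c in m.shadow_stack])  # prints ['entryPoint', 'methodA', 'methodC']
```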

Using a shadow stack allows the SAE to determine where the execution loop is located. As described above, it is not enough to simply count the number of occurrences of a particular call. In order to accurately depict the flow of execution, the SAE preferably determines where the loop (or loops) occurred that resulted in multiple occurrences of the call. Maintaining a shadow stack allows the SAE to keep track of the local loop count for each call that is active on the monitored program's call stack.

As described above, the present invention may utilize debugger services to set breakpoints in the monitored program. Breakpoints are special instructions that, when executed, cause a debug exception to occur. Debuggers intercept these exceptions and use them to gain control of the currently executing program. Having suspended execution of the program, the debugger can inspect the state of the program, including the call stack, threads, local variables, computer registers, and the contents of addressable memory. The SAE uses debugger breakpoints to gain control of a program at predetermined locations, referred to herein as sequence points. A sequence point is established by setting a function breakpoint on the first instruction of a method. The SAE utilizes sequence points to focus analysis on interactions between selected classes and/or methods. Although one exemplary implementation of the invention utilizes debug breakpoints to gain control of a program, alternative approaches could also be used. For example, the SAE may overwrite instructions in the target application with special instructions (such as a kernel mode function call) that would result in the SAE gaining control of the application. The overwritten instructions would be restored once the SAE has finished processing the exception.
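As a further illustration of "alternative approaches" for gaining control, the sketch below swaps debugger breakpoints for Python's in-process tracing hook, `sys.settrace`, which delivers a "call" event when a function is entered and a "return" event when it exits. This is a different technique from the SAE's breakpoints, shown only to make the call/return bookkeeping concrete; `ShadowStackTracer` and its `watched` filter (standing in for the user's filter criteria) are hypothetical.

```python
import sys


class ShadowStackTracer:
    """Maintains a shadow stack of watched functions via Python's trace hook."""

    def __init__(self, watched):
        self.watched = set(watched)   # function names to monitor (filter criteria)
        self.shadow_stack = []
        self.events = []

    def _trace(self, frame, event, arg):
        name = frame.f_code.co_name
        if name not in self.watched:
            return None               # ignore frames outside the filter
        if event == "call":
            self.shadow_stack.append(name)
            self.events.append(("call", tuple(self.shadow_stack)))
        elif event == "return":
            self.shadow_stack.pop()
            self.events.append(("return", name))
        return self._trace            # keep tracing this frame to see its "return"

    def run(self, func, *args):
        sys.settrace(self._trace)
        try:
            return func(*args)
        finally:
            sys.settrace(None)
```

Each recorded "call" event carries a snapshot of the shadow stack at that moment, which is the same information the SAE reads from the program's call stack when a sequence point fires.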

When a sequence point breakpoint occurs, the SAE updates the shadow stack and the summarized call tree with information from the monitored program's call stack. After the shadow stack has been updated, the SAE utilizes the local loop count attribute of the stack frame to detect the origin of execution loops that involve methods that are active on the monitored program's call stack. The final step taken by the SAE when processing a sequence point breakpoint is to scan the current method for call instructions, if it has not already been scanned, and to set breakpoints on the scanned call instructions. These breakpoints are referred to herein as call points.

FIG. 11 illustrates exemplary overall steps that may be performed by a sequence analysis engine in using debugger services to control program execution and build the summarized call tree and shadow stack data structures according to an embodiment of the present invention. Referring to FIG. 11, in step 1100, execution of a program is suspended due to a breakpoint event. If the breakpoint event is caused by a breakpoint set by the user, control proceeds to steps 1110 and 1115, where it is determined whether the breakpoint is a sequence point. As described above, a sequence point is a function breakpoint defined by the user at the first instruction of a method. If the breakpoint is determined to be a sequence point, control proceeds to step 1120, where the shadow stack and the summarized call tree are updated. Exemplary steps for updating the shadow stack and the summarized call tree will be described below with regard to FIG. 12.

In step 1125, the sequence analysis engine detects loops. This step may be performed using the shadow stack, the summarized call tree, and the local loop counter for each call currently on the monitored program's call stack. Detecting loops also includes recording the origin of each loop using the summarized call tree, as described above. In step 1130, the sequence analysis engine determines whether it is within a maximum call depth from the nearest sequence point. If the analysis is within the maximum call depth, then the sequence analysis engine may continue exploring interactions between function or method calls by setting call breakpoints on all call statements in the current method. If the sequence analysis engine is not within the maximum call depth, then execution of the program should resume. The maximum call depth may be programmable by the user, depending on the depth of analysis desired. Accordingly, if it is determined that the maximum call depth has not been exceeded, control proceeds to step 1135, where the SAE begins the process of scanning the current method for call instructions by determining whether the current method has already been scanned. If the method has not been scanned, control proceeds to step 1140, where the method is scanned for calls. In step 1145, call breakpoints are set at each detected call statement so that program execution can be halted at each call statement and relationships between function calls can be determined. Breakpoints that are automatically set by the SAE at calls within a method being analyzed are referred to herein as call points. Control then proceeds to step 1150, where program execution is resumed.

Returning to step 1115, if a breakpoint is determined not to be a sequence point, control proceeds to step 1155, where it is determined whether the breakpoint corresponds to a return point. A return point is a breakpoint set at the instruction in a function that causes the function to return. If the instruction is determined to be a return point, control proceeds to step 1160, where it is determined whether the current thread of execution is a valid thread. If the current thread of execution is a valid thread, control proceeds to step 1163, where call points or call breakpoints in the current method are disabled. In step 1165, the shadow stack frame containing the current function that is returning is removed or popped from the shadow stack. Execution of the program then resumes at step 1150.

Returning to step 1155, if the current breakpoint is determined not to be a return point, control proceeds to step 1170, where it is determined whether the current breakpoint is a call point. As described above, a call point may correspond to a function that is desired to be stepped into in order to analyze relationships between calls. Call points may be automatically set by the SAE, as indicated by step 1145. If the breakpoint is determined to be a call point, control proceeds to step 1175, where the debugger's step-in service is used to step into the function corresponding to the call point. Once the step-in has been performed, control returns to step 1150, where program execution is resumed.

Returning to step 1100, if program execution is halted due to a breakpoint event and the breakpoint event is a step-in complete event, control proceeds to step 1180, where analysis of the stepped-into function begins. In the stepped-into function, the SAE first determines in step 1185 whether a sequence point is present in the function. If a sequence point is present, control proceeds to step 1150, where execution of the program is resumed. The program will be suspended when the sequence point breakpoint is encountered. If the stepped-into function does not include a sequence point, control proceeds to step 1120, where the shadow stack and the summarized call tree are updated. Steps 1125-1145 may be repeated to detect loops and automatically set call points within the stepped-into function, provided that the maximum call depth has not been exceeded. Thus, using these steps, interactions between multiple layers of function calls may be automatically analyzed. Once analysis of the stepped-into function is complete, execution of the program is resumed.
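The event handling of FIG. 11 can be summarized as a dispatcher. The sketch below is not a debugger integration: `sae` stands for a hypothetical engine object whose method names (`update_shadow_stack_and_tree`, `detect_loops`, `set_call_points`, `step_in`, `resume`, and so on) are invented to label the flowchart steps noted in the comments.

```python
from enum import Enum, auto


class Event(Enum):
    SEQUENCE_POINT = auto()
    RETURN_POINT = auto()
    CALL_POINT = auto()
    STEP_IN_COMPLETE = auto()


def analyze_method(sae):
    """Steps 1120-1145, shared by sequence points and step-in completion."""
    sae.update_shadow_stack_and_tree()                # step 1120 (FIG. 12)
    sae.detect_loops()                                # step 1125 (FIG. 13)
    if sae.call_depth() <= sae.max_call_depth:        # step 1130
        if not sae.method_scanned():                  # step 1135
            call_sites = sae.scan_for_calls()         # step 1140
            sae.set_call_points(call_sites)           # step 1145


def handle_event(event, sae):
    """Dispatch one debug event following FIG. 11, then resume (step 1150)."""
    if event is Event.STEP_IN_COMPLETE:               # step 1180
        if not sae.has_sequence_point():              # step 1185
            analyze_method(sae)
        # otherwise resume and let the sequence point breakpoint fire later
    elif event is Event.SEQUENCE_POINT:               # steps 1110-1115
        analyze_method(sae)
    elif event is Event.RETURN_POINT:                 # step 1155
        if sae.is_valid_thread():                     # step 1160
            sae.disable_call_points()                 # step 1163
            sae.pop_shadow_stack()                    # step 1165
    elif event is Event.CALL_POINT:                   # step 1170
        sae.step_in()                                 # step 1175
    sae.resume()                                      # step 1150
```

Factoring steps 1120 through 1145 into one helper reflects the flowchart's reuse of that path for both sequence points and step-in completion.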

Maintaining a shadow stack and a summarized call tree is an important step in summarizing the execution flow of a computer program. Therefore, it is appropriate to further discuss the process of building and maintaining the shadow stack and the summarized call tree. As illustrated by step 1120 in FIG. 11, the SAE updates the shadow stack and the summarized call tree while processing sequence points and while processing the step-in complete debug event. The first step in the process is to walk back (from top to bottom) on the program's call stack until the current stack frame matches the top frame of the shadow stack. A match is found when both stack frames refer to the same method. After this step, the shadow stack and the program's call stack are considered to be synchronized. In the next step, the SAE walks forward (toward the top) on the call stack and creates a call object from the program's current stack frame. This call object is used to perform a lookup operation in the summarized call tree. If the summarized call tree currently contains a matching call, the SAE resets the local loop count for all child call objects of the matching call. Resetting the local loop count for child calls is an important step because it allows the SAE to determine the origin of execution loops. If the summarized call tree does not currently contain a call object matching the call associated with the program's stack frame, then the method call has never been made in the current context and the call object must be added to the summarized call tree. Adding the call to the summarized call tree involves adding it as a child of the call object at the top of the shadow stack. If the call object was found in the summarized call tree, the matching call object is resurrected from the summarized call tree and pushed onto the top of the shadow stack. If the call object was not found in the summarized call tree, the newly created call object is pushed onto the top of the shadow stack.
The whole process described above repeats until the SAE has processed the top stack frame in the monitored program's call stack.

FIG. 12 is a flow chart illustrating an exemplary process for updating the shadow stack and the summarized call tree. Referring to FIG. 12, the process of updating the shadow stack and the summarized call tree begins in step 1200. In step 1205, it is determined whether the shadow stack is empty. If the shadow stack is empty, control proceeds to step 1210, where the SAE walks to the bottom of the program's call stack. Since the bottom of the program's call stack corresponds to the current instruction and the shadow stack is empty, the shadow stack and the program stack are synched. In step 1205, if the shadow stack is not empty, control proceeds to step 1215, where the SAE walks back from the bottom of the program stack until the SAE arrives at the instruction corresponding to the current state of the shadow stack.

Once the program stack and the shadow stack are synched, control proceeds to step 1220 where the SAE walks forward one position in the program stack. In step 1225, the SAE determines whether the current position is the top of the program stack. If the current position is the top of the program stack, there is no need to update the shadow stack because the program stack does not include any further instructions that are not already in the shadow stack. Accordingly, control proceeds to step 1230 where the process of updating the shadow stack completes.

If, however, the current position is not the top of the program stack, there are instructions in the call stack that have not been placed in the shadow stack. Accordingly, control proceeds to step 1235 where a call object is created based on the current program stack entry. The call object is used to perform a lookup in the summarized call tree. In step 1245, the SAE determines whether a match for the current call is found in the summarized call tree. If a match is found, control proceeds to step 1250 where the SAE sets the local loop count to zero for all child call objects of the matching call object. In step 1255, the call object is pushed onto the shadow stack. In step 1260, the SAE sets a breakpoint at the return address of the function call. In step 1265, the SAE walks forward one position in the call stack. Control then returns to step 1225.

Returning to step 1245, if the call is not found in the summarized call tree, control proceeds to step 1270 where the call object is added to the summarized call tree. In step 1275, the call object is added as a child of the call object at the top of the shadow stack. Control then returns to step 1255 where the call object is pushed onto the shadow stack, step 1260 where a breakpoint is set at the return address of the current call, and step 1265 where the next instruction in the call stack is accessed.
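The FIG. 12 process can be sketched as follows, under several assumptions: the program call stack is given as a Python list whose index 0 is the outermost frame and whose last element is the current frame; frames are matched by method name only; a set of addresses stands in for the debugger's return-point breakpoints; and a sentinel root node serves as the parent of the first call when the shadow stack is empty. The names `Frame`, `Call`, and `update_shadow_stack` are hypothetical. The forward walk continues until the current (topmost) frame has itself been processed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """A program call-stack frame as a debugger might report it (simplified)."""
    method: str
    return_address: int


@dataclass(eq=False)
class Call:
    """Hypothetical summarized-call-tree node."""
    method: str
    children: list[Call] = field(default_factory=list, repr=False)
    local_loop_count: int = 0

    def find_child(self, method: str) -> Optional[Call]:
        return next((c for c in self.children if c.method == method), None)


def update_shadow_stack(program_stack: list[Frame], shadow: list[Call],
                        root: Call, return_points: set[int]) -> None:
    """Synchronize `shadow` with `program_stack` and extend the call tree."""
    # Steps 1205-1215: find the frame matching the top of the shadow stack.
    start = 0
    if shadow:
        for i in range(len(program_stack) - 1, -1, -1):
            if program_stack[i].method == shadow[-1].method:
                start = i + 1
                break
        else:
            raise ValueError("shadow stack is out of sync with the call stack")

    # Steps 1220-1275: walk forward over frames not yet on the shadow stack.
    for frame in program_stack[start:]:
        parent = shadow[-1] if shadow else root
        call = parent.find_child(frame.method)          # steps 1235-1245: lookup
        if call is not None:
            for child in call.children:                 # step 1250: reset counts
                child.local_loop_count = 0
        else:
            call = Call(frame.method)                   # steps 1270-1275: add to tree
            parent.children.append(call)
        shadow.append(call)                             # step 1255: push
        return_points.add(frame.return_address)         # step 1260: return point
```

In this sketch, when a return point has popped `B` and `A` calls `B` again, the second update finds the existing `B` node under `A` instead of adding a duplicate, so the tree keeps one node per call context.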

Detecting the origin of execution loops is crucial to accurately summarizing the behavior of a computer program. As illustrated by step 1125 in FIG. 11, the SAE may perform loop detection while processing sequence point breakpoints and while processing step-in-complete debugger events. FIG. 13 illustrates exemplary steps for detecting the origin of execution loops in the monitored program according to an embodiment of the present invention. The approach involves iterating from the top of the shadow stack toward the bottom while incrementing the LocalLoopCount for each call and then checking whether the LocalLoopCount is greater than one. If the LocalLoopCount is greater than one, the SAE sets the LocalLoopDetected attribute of the Call to true, and the loop detection process is complete. The key to making the loop detection function work is combining the shadow stack with the summarized call tree in order to detect whether a child call was called by its parent Call (in the summarized call tree) more than once while the parent call was active on the shadow stack. To accomplish this, the LocalLoopCount for the call is reset to zero whenever the parent call is resurrected from the summarized call tree and placed on the shadow stack (see FIG. 12, block 1250). The logic is as follows: if the parent call has been resurrected from the call tree, then none of the child call objects of the parent call has been called yet in the current context. Therefore, it is accurate to set the LocalLoopCount to zero to indicate that the call has not been executed yet. The SAE preferably does not reset the Call object's LocalLoopDetected flag. Once this flag has been set to true, it preferably remains true. This allows the SAE to determine whether the call has ever been called in the context of an execution loop.

The present invention also utilizes breakpoints to establish two additional types of breakpoint locations. These breakpoint locations are referred to as return points and call points, as described above with respect to FIG. 11. Return points are established at the return address of a method call and are used to synchronize the shadow stack with the call stack of the program being monitored. When the SAE encounters a return point, a shadow stack frame is popped from the top of the shadow stack. This step synchronizes the program's call stack with the shadow stack that is maintained by the SAE.

As described above with respect to FIG. 11, call points are established at the location of a call instruction within a given method. The SAE utilizes call points combined with the debugger step-in function to enable automatic discovery of interactions the current object has with other objects or methods. When a sequence point is encountered, the current method is scanned for call instructions. A call point is established by setting a breakpoint at the location of each call statement found during the scan. When the SAE continues execution of the monitored program, a breakpoint event will occur when the program executes an instruction located at one of the call points. If the SAE detects that a breakpoint has occurred at a call point, the SAE initiates the debugger Step-In service to step into the function identified by the call instruction. The SAE then allows the monitored program to continue. A Step-Complete event will occur once the program has successfully branched to the location of the function. When the SAE has determined that the current event is Step-Complete, it builds the shadow stack and performs loop detection as it does during the processing of a sequence point, described earlier.
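The handling of sequence points, call points, return points, and Step-Complete events can be sketched as a small dispatcher. This is a sketch against a hypothetical in-memory `FakeDebugger`, not a real debugger API: instructions are modeled as (address, text) pairs, a call instruction is recognized by its `call` prefix, and the shadow-stack update and loop detection of FIGS. 12 and 13 are reduced to a placeholder that records the entered method. `Event` and `SequenceAnalysisEngine` are likewise illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class Event(Enum):
    SEQUENCE_POINT = auto()
    CALL_POINT = auto()
    RETURN_POINT = auto()
    STEP_COMPLETE = auto()


@dataclass
class FakeDebugger:
    """In-memory stand-in for debugger services; every method is hypothetical."""
    code: dict[str, list[tuple[int, str]]]   # method -> [(address, instruction)]
    breakpoints: set[int] = field(default_factory=set)
    step_in_requested: bool = False
    resume_count: int = 0

    def set_breakpoint(self, address: int) -> None:
        self.breakpoints.add(address)

    def step_in(self) -> None:
        self.step_in_requested = True

    def resume(self) -> None:
        self.resume_count += 1


@dataclass
class SequenceAnalysisEngine:
    debugger: FakeDebugger
    shadow: list[str] = field(default_factory=list)

    def update_and_detect(self, method: str) -> None:
        # Placeholder for the FIG. 12 update and FIG. 13 loop detection.
        self.shadow.append(method)

    def set_call_points(self, method: str) -> None:
        # Scan the method for call instructions and break on each one.
        for address, instruction in self.debugger.code.get(method, []):
            if instruction.startswith("call"):
                self.debugger.set_breakpoint(address)

    def on_event(self, event: Event, method: str) -> None:
        if event is Event.SEQUENCE_POINT:
            self.update_and_detect(method)
            self.set_call_points(method)
        elif event is Event.CALL_POINT:
            self.debugger.step_in()       # a Step-Complete event follows in the callee
        elif event is Event.STEP_COMPLETE:
            self.update_and_detect(method)
        elif event is Event.RETURN_POINT:
            self.shadow.pop()             # keep the shadow stack in step with the call stack
        self.debugger.resume()            # every event ends by resuming the target
```

Keeping all four event types in one handler reflects the text's point that a Step-Complete event is processed like a sequence point, while call points and return points only steer the debugger or the shadow stack.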

One of the most important aspects of summarizing the operational behavior of a computer program is detecting when a function is called from within an execution loop, and detecting the node in the summarized call tree corresponding to the sequence of functions in which the execution loop occurred. This sequence of functions is referred to herein as the origin of the execution loop. FIG. 13 illustrates an exemplary process for detecting the origin of an execution loop according to an embodiment of the present invention. Referring to FIG. 13, in step 1300, the process of detecting the origin of an execution loop begins. In step 1310, the SAE determines whether the current instruction is located at the bottom of the shadow stack. If the current instruction is at the bottom of the shadow stack, it is either the entry point of the analysis or its origin (originating function) has already been recorded. Accordingly, control proceeds to step 1320 where the process of determining the origin of a loop ends.

Returning to step 1310, if the current instruction is not at the bottom of the shadow stack, then in step 1330 the loop counter in the call object associated with the function is incremented. In step 1340, it is determined whether the loop counter associated with the current call is greater than one. If the loop counter is greater than one, the SAE sets the Boolean LocalLoopDetected attribute of the call object to true, indicating that the current call is being called from within an execution loop. In step 1340, if the loop counter is not greater than one, control proceeds to step 1360 where the next stack frame in the shadow stack is analyzed.
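Read literally, the FIG. 13 loop detection can be sketched as below. The sketch assumes that `shadow[0]` is the bottom of the shadow stack (the analysis entry point) and `shadow[-1]` is the top; resetting the counts when a parent call is resurrected (FIG. 12, block 1250) happens elsewhere and is not shown. `Call` and `detect_loop_origin` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class Call:
    """Hypothetical shadow-stack / call-tree node."""
    method: str
    local_loop_count: int = 0
    local_loop_detected: bool = False   # once True, stays True


def detect_loop_origin(shadow: list[Call]) -> Call | None:
    """FIG. 13: walk from the top of the shadow stack toward the bottom.

    shadow[0] is the bottom (the analysis entry point) and is never counted.
    Returns the call flagged as being called from within a loop, or None.
    """
    for call in reversed(shadow[1:]):       # step 1310 stops at the bottom frame
        call.local_loop_count += 1          # step 1330
        if call.local_loop_count > 1:       # step 1340
            call.local_loop_detected = True
            return call
    return None
```

With a shadow stack of `[main, A, B]`, the first pass counts `B` and `A` once each and flags nothing; if `B` returns and is called again while the same activation of `A` is still on the stack, the next pass raises `B`'s count to two and flags `B`.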

In one implementation, the present invention includes a system that can summarize the execution flow of a computer program. FIG. 14 is a block diagram illustrating an exemplary architecture for summarizing operational behavior of a computer program according to an embodiment of the present invention. As illustrated in FIG. 14, one exemplary architecture uses a combination of a GUI application 1400, SAE 1410, and debugger services 1420 to monitor the execution flow of a target application 1430. GUI application 1400 may be any suitable type of GUI application for controlling the operation of an underlying program, such as sequence analysis engine 1410. In one example, GUI application 1400 may be a Windows-based GUI application. Sequence analysis engine 1410 may also be a software application configured to control target application 1430 using debugger API 1415 to access debugger 1420. Sequence analysis engine 1410 may implement the steps described above with regard to FIGS. 8, 11, 12, and 13 to analyze and summarize operational behavior of target application 1430. Target application 1430 may be any suitable target application desired to be analyzed. Target application 1430 may be written using an object-oriented language or a procedure-oriented language.

In operation, GUI application 1400 may allow a user to configure inputs for SAE 1410, to control the operation of SAE 1410 (start, stop, pause), and to view and manipulate the outputs from SAE 1410. SAE 1410 interacts with debugger 1420 to control target application 1430 and to examine and record its execution. SAE 1410 accesses the services of debugger 1420 through debugger API 1415. It should be noted that debugger APIs are often provided by debugger frameworks in order to offer customized debugging capabilities.

SAE 1410 may utilize debugger services 1420 in order to examine and record the execution flow of a computer program. While most debugger applications are used by computer programmers to isolate and fix bugs in computer programs, it is possible to automate the use of debugger services to control, examine, and record the execution of a computer program. It is this automated use that allows SAE 1410 to step into methods, scan for call statements, and set new breakpoints at the call statements as described above with respect to FIG. 11.

In performing automated analysis of target application 1430, SAE 1410 may utilize services that are commonly offered by debuggers. FIG. 15 illustrates exemplary debugger services that may be accessed by SAE 1410. SAE 1410 uses debugger API 1415 to access debugger services 1500, including services 1505 for inspecting the state of a program, services 1510 for setting and clearing breakpoints, services 1520 for single stepping into or over computer instructions, services 1530 for controlling the execution of threads and processes, and services 1540 for accessing debug symbols.

SAE 1410 may utilize debugger services to inspect the program state, including the program call stack, local and global variables, and the state of computer registers. Of particular interest is the data that is stored on the call stack. The call stack is composed of stack frames. Each stack frame relates to a method that is currently being executed by the program. Information in the stack frame includes local variables and the return address of the next statement to execute once the method being called has returned. SAE 1410 may walk the call stack and examine the contents of each stack frame. When it detects a new call, SAE 1410 stores the return address held in the stack frame. This return address can be used to identify the memory address of the call statement.
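Recovering the call statement from a stored return address can be sketched as a search over a sorted instruction list. This sketch assumes an architecture such as x86, where a call instruction pushes the address of the instruction that follows it, and assumes the SAE has a sorted list of instruction start addresses (for example, from a disassembly of the method); `call_site_for` is a hypothetical helper.

```python
import bisect


def call_site_for(return_address: int, instruction_addresses: list[int]) -> int:
    """Address of the instruction immediately preceding `return_address`.

    When a call pushes the address of the following instruction, that
    preceding instruction is the call statement itself.
    `instruction_addresses` must be sorted in ascending order.
    """
    i = bisect.bisect_left(instruction_addresses, return_address)
    if i == 0:
        raise ValueError("no instruction precedes the return address")
    return instruction_addresses[i - 1]
```

Searching the instruction list, rather than subtracting a fixed length from the return address, avoids assuming that every call instruction has the same encoded size.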

Another debugger service that is utilized extensively by SAE 1410 is breakpoint service 1510. SAE 1410 utilizes the breakpoint service to create, enable, and disable breakpoints. A breakpoint is a special computer instruction that halts the current program and gives control to debugger 1420. In most cases, debugger 1420 actually replaces a specified computer instruction with a special instruction that causes a breakpoint exception to occur. When a breakpoint exception occurs, debugger 1420 catches the exception and suspends execution of the application. When a debug breakpoint is encountered, most debuggers allow the user to visually inspect the state of the suspended program, including the registers, memory, local and global variables, and the call stack. As described above, SAE 1410 uses debugger breakpoints to gain control of a program at predetermined locations referred to herein as sequence points. A sequence point is established by setting a function breakpoint on the first instruction of a method. SAE 1410 utilizes sequence points to focus analysis on interactions between select classes and/or methods. SAE 1410 is notified by the debugger service whenever a debug breakpoint is encountered. After analyzing the current call stack and recording the execution flow, SAE 1410 uses the debugger services to resume execution of the suspended program.
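The instruction replacement mentioned above can be illustrated with a simulation on a byte array. The sketch assumes x86, whose one-byte INT3 breakpoint instruction has the opcode 0xCC and, because it is a trap, leaves the reported instruction pointer one byte past the patched address. A real debugger writes into the target process's memory through operating-system facilities, which this sketch does not attempt; `SoftwareBreakpoints` is a hypothetical name.

```python
from __future__ import annotations

INT3 = 0xCC  # x86 one-byte breakpoint instruction


class SoftwareBreakpoints:
    """Simulates planting breakpoints by patching code bytes in `memory`."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.saved: dict[int, int] = {}   # address -> original byte

    def enable(self, address: int) -> None:
        if address not in self.saved:     # never save our own 0xCC
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def disable(self, address: int) -> None:
        original = self.saved.pop(address, None)
        if original is not None:
            self.memory[address] = original   # restore the replaced instruction byte

    def breakpoint_address(self, ip_after_trap: int) -> int | None:
        # INT3 is a trap, so the reported instruction pointer is one past it.
        candidate = ip_after_trap - 1
        return candidate if candidate in self.saved else None
```

Saving the original byte is what allows the breakpoint to be disabled later and the suspended program to resume with its real instruction in place.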

SAE 1410 may utilize single-stepping services to explore interactions for select classes and methods. Debugger 1420 provides services that allow SAE 1410 to control the execution of a single instruction or of a range of instructions. The Step-In service allows SAE 1410 to step into a function that is referenced by a call instruction. As described above, SAE 1410 utilizes the Step-In service to analyze a function that is referenced by a call point (see FIG. 11, step 1170).

SAE 1410 utilizes the Suspend service to temporarily pause a program's thread in order to inspect program state and set required breakpoints. SAE 1410 may utilize the Resume service to resume a suspended thread when analysis is complete.

In addition to using debugger services to control program execution, SAE 1410 may also utilize debugger services for reading program symbols. These symbols allow SAE 1410 to identify the memory locations of functions specified by the user and to associate low-level machine or intermediate instructions with higher-level source code statements. This mapping allows SAE 1410 to annotate the sequence diagrams with fragments of source code. The ability to display source code in generated sequence diagrams greatly enhances the diagram's ability to summarize the execution flow of a computer program.
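The address-to-source mapping supplied by debug symbols is commonly organized as a line table sorted by address. The sketch below assumes such a table, in which each entry records the first instruction address generated for a source line, and looks up the line that owns a given instruction address. `LineEntry` and `LineTable` are hypothetical and do not correspond to any particular symbol format.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class LineEntry:
    address: int   # first instruction address generated for the line
    file: str
    line: int


class LineTable:
    """Hypothetical address-to-source mapping built from debug symbols."""

    def __init__(self, entries: list[LineEntry]) -> None:
        self.entries = sorted(entries, key=lambda e: e.address)
        self.addresses = [e.address for e in self.entries]

    def lookup(self, address: int) -> LineEntry | None:
        # The owning line is the last entry that starts at or before `address`.
        i = bisect.bisect_right(self.addresses, address)
        return self.entries[i - 1] if i else None
```

Given the source file text, the returned file and line number are enough to pull the fragment that annotates a loop or conditional in the sequence diagram.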

FIG. 16 is a block diagram illustrating operational relationships between GUI application 1400, SAE 1410, and target application 1430. As illustrated in FIG. 16, GUI application 1400 may allow the user to specify SAE configuration inputs 1600, such as the target application being analyzed, the classes/methods to include in the analysis, classes/methods to ignore, the maximum call depth for monitoring function calls, and a flag that indicates whether auto-discovery mode is enabled. GUI application 1400 may also allow the user to control the analysis session, including the ability to start, stop, pause, and resume analysis. To start analysis of the target application, GUI application 1400 sends a command to SAE 1410 to inform it that it is time to begin analysis. SAE 1410 reads the inputs and begins analysis of the target application. When analysis is complete, or when analysis is stopped by the user, SAE 1410 outputs the results of the analysis, including a summarized call tree for each analyzed thread of execution. SAE output 1610 is subsequently loaded by GUI application 1400, and diagrams depicting the flow of execution are generated and displayed in graphical form.
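The configuration inputs 1600 might be represented and applied as follows. The field names, the use of shell-style wildcard patterns through Python's `fnmatch` module, and the rule that ignore patterns take precedence over include patterns are all assumptions made for this sketch; `SaeConfig` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class SaeConfig:
    """Hypothetical container for SAE configuration inputs 1600."""
    target_application: str
    include: list[str] = field(default_factory=list)   # e.g. "Order.*"
    ignore: list[str] = field(default_factory=list)
    max_call_depth: int = 10
    auto_discovery: bool = True

    def should_monitor(self, method: str, depth: int) -> bool:
        """Decide whether a call at `depth` should be recorded."""
        if depth > self.max_call_depth:
            return False
        if any(fnmatchcase(method, p) for p in self.ignore):
            return False                  # ignore patterns win over include patterns
        return any(fnmatchcase(method, p) for p in self.include)
```

For example, with `include=["Order.*"]` and `ignore=["Order.debug*"]`, a call to `Order.total` is monitored while `Order.debugDump` and `Item.price` are not.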

FIG. 17 is a flow chart illustrating exemplary steps that may be performed by a user in initiating analysis of a target application according to an embodiment of the present invention. Referring to FIG. 17, in step 1700, a user enters configuration information, such as the target application being monitored, classes or methods to inspect, classes or methods to ignore, maximum call depth, and whether auto-discovery mode will be enabled. In step 1705, the user starts the analysis by selecting a start menu button. In step 1710, in response to the user's selection of the start menu button, GUI application 1400 sends a start message to SAE 1410. In step 1715, SAE 1410 receives the start message. In step 1720, SAE 1410 reads configuration information 1600 entered by the user in step 1700. In step 1725, SAE 1410 loads the target application.

In order to analyze the target application, in step 1730, SAE 1410 sets entry breakpoints according to the user configuration. In step 1735, SAE 1410 analyzes the target application using the steps described above with regard to FIGS. 8, 11, 12, and 13. The present invention is not limited to starting analysis of a target application in response to a command received from a user. In an alternate implementation, SAE 1410 may start analysis of the target application using a user-defined trigger event, such as when a variable reaches a value specified by the user. When the trigger event occurs, the SAE may enable all breakpoints specified by the user.
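The trigger-event alternative can be sketched as a small state holder that enables the user's breakpoints the first time a watched variable takes the specified value. How the SAE learns of variable changes (for example, through a data watchpoint or by inspecting the variable at each pause) is left open, "reaches" is interpreted here as equality, and `TriggeredAnalysis` and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TriggeredAnalysis:
    """Defers analysis until a user-defined trigger fires (hypothetical)."""
    variable: str
    trigger_value: object
    user_breakpoints: list[int]
    enabled: set[int] = field(default_factory=set)
    triggered: bool = False

    def on_variable_changed(self, name: str, value: object) -> None:
        # Fires at most once; afterwards the breakpoints simply stay enabled.
        if not self.triggered and name == self.variable and value == self.trigger_value:
            self.triggered = True
            self.enabled.update(self.user_breakpoints)
```

Until the trigger fires, the target runs without stopping at the user's breakpoints, so the summarized call tree covers only the portion of execution the user cares about.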

FIG. 18 illustrates exemplary steps that may be performed by a user in stopping analysis of a target application. Referring to FIG. 18, in step 1800, a user stops the analysis, for example, by pressing a stop menu button provided by GUI 1400. In step 1805, GUI application 1400 sends a stop message to SAE 1410. In step 1810, SAE 1410 receives the stop message from GUI 1400. In step 1815, SAE 1410 stops the analysis of the target application. In step 1820, SAE 1410 writes output data to SAE output 1610. Exemplary output may include method calls and a context for each method call, as described above. In step 1825, GUI 1400 reads SAE output 1610. In step 1830, GUI 1400 generates a diagram that summarizes the execution of the program, such as a sequence diagram.

Thus, the present invention includes methods, systems, and computer program products for summarizing the operational behavior of a computer program. The method may include setting execution breakpoints at functions of interest in computer program code. The computer program code is then executed to analyze the operational behavior of the computer program. During execution of the computer program, conditional execution and looping of each function of interest are tracked. A summary of the conditional execution and looping is produced and displayed to the user.

It will be understood that various details of the invention may be changed without departing from the scope of the invention. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, the invention being defined by the claims.

1. A method for summarizing operational behavior of a computer program, the method comprising: (a) executing a computer program in a mode that allows control over execution of the computer program; (b) pausing execution of the program at predetermined locations corresponding to instructions in the computer program; and (c) for each location: (i) recording contents of a call stack containing function calls made by the program that have not yet returned; and (ii) for each function call in the call stack, recording conditions under which the function was called.
2. The method of claim 1 wherein executing the computer program in a mode that allows control over execution of the computer program includes executing the computer program under control of debugger services.
3. The method of claim 1 wherein executing the computer program in a mode that allows control over execution of the computer program includes executing the computer program under control of profiler services.
4. The method of claim 1 wherein executing the computer program in a mode that allows control over execution of the computer program includes inserting statements in the computer program to pause execution of the computer program and executing the computer program under control of the operating system.
5. The method of claim 1 wherein pausing execution of the program at predetermined locations corresponding to instructions in the computer program includes pausing the computer program at breakpoints set by a user at function calls or method calls of interest in the computer program.
6. The method of claim 1 wherein pausing execution of the program at predetermined locations corresponding to instructions in the computer program includes using debugger services to automatically set breakpoints at call statements within a function in the computer program and pausing execution of the computer program at the automatically-set breakpoints.
7. The method of claim 1 wherein recording contents of a call stack includes creating a call stack object for each function call in the call stack, creating a shadow stack, and storing the call stack objects in the shadow stack.
8. The method of claim 7 wherein recording conditions under which each function was called includes creating a summarized call tree, and storing indicators of each call object in the summarized call tree.
9. The method of claim 8 comprising identifying whether each call in the shadow stack is called in an execution loop and the scope of the execution loop by maintaining a local loop counter for each call object and incrementing the local loop counter for each occurrence of each call object in the shadow stack.
10. The method of claim 9 comprising applying post processing to each call object in the summarized call tree to identify conditionally executed blocks of code and execution loops.
11. The method of claim 10 comprising summarizing the operational behavior of the program in a format that includes notation for identifying conditionally executed blocks of code and execution loops.
12. The method of claim 11 wherein the format includes a sequence diagram.
13. The method of claim 12 comprising using debug symbols to supplement the sequence diagram with loop and conditional notation containing fragments of source code from the computer program that corresponds to intermediate instructions that indicate a loop or conditional statement.
14. A system for summarizing the operational behavior of a computer program, the system comprising: (a) a first data structure for storing information corresponding to calls currently on a call stack of a target computer program being executed; (b) a second data structure for storing a history of function calls made by the computer program and contexts in which the function calls were made; and (c) a sequence analysis engine for controlling execution of the computer program, for pausing the execution at predetermined locations in the computer program, and, at each location, for storing contents of the computer program's call stack in the first data structure and updating the second data structure based on the contents of the computer program's call stack.
15. The system of claim 14 wherein the first data structure comprises a shadow stack for storing call object indicators for calls currently on the program's call stack.
16. The system of claim 15 wherein each call object referred to in the shadow stack stores caller and callee information for the function call and a local loop counter for each function call.
17. The system of claim 14 wherein the second data structure comprises a call tree indicating parent-child relationships between calls currently in the call stack.
18. The system of claim 14 wherein the sequence analysis engine is adapted to use debugger services to control execution of the program.
19. The system of claim 14 wherein the sequence analysis engine is adapted to use profiler services to control execution of the computer program.
20. The system of claim 14 wherein the sequence analysis engine is adapted to use instructions embedded in source code of the computer program to control execution of the computer program.
21. The system of claim 14 wherein the sequence analysis engine is adapted to pause execution of the computer program at breakpoints set by a user and update contents of the first and second data structures for each breakpoint.
22. The system of claim 14 wherein the sequence analysis engine is adapted to automatically set breakpoints at call statements in a function being analyzed and to use debugger step-in services to determine relationships between the call statements and the function being analyzed.
23. The system of claim 14 wherein the sequence analysis engine is adapted to analyze intermediate language code to detect blocks of code that are conditionally executed.
24. The system of claim 14 wherein the sequence analysis engine is adapted to analyze intermediate language code in the computer program to detect groups of calls corresponding to the same execution loop.
25. The system of claim 14 wherein the sequence analysis engine is adapted to analyze machine language code to detect blocks of code that are conditionally executed.
26. The system of claim 14 wherein the sequence analysis engine is adapted to analyze machine language code in the computer program to detect groups of calls corresponding to the same execution loop.
27. The system of claim 26 wherein the sequence analysis engine is adapted to generate a behavioral diagram indicating the conditionally executed blocks of code and loops.
28. The system of claim 27 wherein the sequence analysis engine is adapted to use debug symbols to supplement the behavioral diagram with loop and conditional notation containing fragments of source code corresponding to guard conditions for the loops or conditional notation.
29. The system of claim 28 wherein the behavioral diagram comprises a sequence diagram.
30. A computer program product comprising computer-executable instructions embodied in a computer-readable medium for performing steps comprising: (a) executing a computer program in a mode that allows control over execution of the computer program; (b) pausing execution of the program at predetermined locations corresponding to instructions in the computer program; and (c) for each location: (i) recording contents of a call stack containing function calls made by the program that have not yet returned; and (ii) for each function call in the call stack, recording conditions under which the function was called.
31. The computer program product of claim 30 wherein executing the computer program in a mode that allows control over execution of the computer program includes executing the computer program under control of debugger services.
32. The computer program product of claim 30 wherein executing the computer program in a mode that allows control over execution of the computer program includes executing the computer program under control of profiler services.
33. The computer program product of claim 30 wherein executing the computer program in a mode that allows control over execution of the computer program includes inserting statements in the computer program to pause execution of the computer program and executing the computer program under control of the operating system.
34. The computer program product of claim 30 wherein pausing execution of the program at predetermined locations corresponding to instructions in the computer program includes pausing the computer program at breakpoints set by a user at function calls or method calls of interest in the computer program.
35. The computer program product of claim 30 wherein pausing execution of the program at predetermined locations corresponding to instructions in the computer program includes using debugger services to automatically set breakpoints at call statements within a function in the computer program and pausing execution of the computer program at the automatically-set breakpoints.
36. The computer program product of claim 30 wherein recording contents of a call stack includes creating a call stack object for each function call in the call stack, creating a shadow stack, and storing the call stack objects in the shadow stack.
37. The computer program product of claim 36 wherein recording conditions under which each function was called includes creating a summarized call tree, and storing indicators of each call object in the summarized call tree.
38. The computer program product of claim 37 comprising identifying whether each call in the shadow stack is called in an execution loop and the scope of the execution loop by maintaining a local loop counter for each call object and incrementing the local loop counter for each occurrence of each call object in the shadow stack.
39. The computer program product of claim 38 comprising applying post processing to each call object in the summarized call tree to identify conditionally executed blocks of code and execution loops.
40. The computer program product of claim 39 comprising summarizing the operational behavior of the program in a format that includes notation for identifying conditionally executed blocks of code and execution loops.
41. The computer program product of claim 38 wherein the format includes a sequence diagram.
42. The computer program product of claim 40 comprising using debug symbols to supplement the sequence diagram with loop and conditional notation containing fragments of source code from the computer program that corresponds to intermediate instructions that indicate a loop or conditional statement.